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Abstract 

It is well-known that the Gutenberg-Richter power law distribution has to be modi- 
fied for large seismic moments, due to energy conservation and geometrical reasons. 
Several models have been proposed, either in terms of a second power law with a 
larger 6-value beyond a cross-over magnitude, or based on a "hard" magnitude cut- 
off or a "soft" magnitude cut-off using an exponential taper. Since the large scale 
tectonic deformation is dominated by the very largest earthquakes and since their 
impact on loss of life and properties is huge, it is of great importance to constrain as 
much as possible the shape of their distribution. We present a simple and powerful 
probabilistic theoretical approach that shows that the Gamma distribution is the best 
model, under the two hypothesis that the Gutenberg-Richter power law distribution 
holds in absence of any condition and that one or several constraints are imposed, 
either based on conservation laws or on the nature of the observations themselves. 
The selection of the Gamma distribution does not depend on the specific nature of 
the constraint. We illustrate the approach with two constraints, the existence of a fi- 
nite moment release rate and the observation of the size of a maximum earthquake in 
a finite catalog. Our predicted "soft" maximum magnitudes compare favorably with 
those obtained by Kagan [1997] for the Flinn-Engdahl regionalization of subduction 
zones, collision zones and mid-ocean ridges. 



1 Introduction 



The Gutenberg-Richter law states that the number N(m) of earthquakes with mag- 
nitude > m is 

log 10 N(m) = a — b m , with b « 1 . (1) 

This relationship is best understood by transforming it into a power law distribution 
for the scalar seismic moment M (expressed in Nm) 

a M M b 
P(M)dM = dM , with n = - , for M t < M < +oo , (2) 

using the relation 

m = i[log 10 M-9] , (3) 

where (3 is generally taken equal to 1.5. M t is a lower seismic moment cut-off. 

It is usually asserted that this self-similar law (^) cannot be true for M = +oo 
because it would require that an infinite amount of energy be released from the Earth's 
interior [Wfyss, 1973; Knopoff and Kagan, 1977; Kagan, 1994], since the mathematical 
expectation or average (M) = dM M P(M) = +oo. 

However, there is a subtlety not always appreciated in the literature that warrants 
clarification. In fact, the statement that the self-similar law (0) cannot be true for 
M = +oo due to the infinite expectation is not correct. The point is that, notwith- 
standing an infinite mathematical expectation, the measurement of the cumulative 
sum of seismic moments over any finite time interval T will always give a finite result! 
Thus, the infinite mathematical expectation does not require an infinite amount of 
cumulative energy released from the Earth's interior and does not contradict any 
conservation law. However, the infinite mathematical expectation is associated to 
the prediction that the cumulative sum of seismic moments should increase typically 
like the power 

j>i/n = T 3/2 of the time interv£ j T 

over which it is measured. In 
other words, the rate of seismic moment release is predicted to increase with time as 
Tm , with in addition huge fluctuations or jumps when a next biggest event occurs 
(which is precisely the origin of the acceleration of the rate). To sum up, the relevant 
question is not that of a finite or infinite expectation but rather whether the rate 
of cumulative growth of seismic moment is constant or increasing with observation 
time. Until this is completely settled observationally, the fmiteness of the mathe- 



matical expectation of the seismic moments has the status of an hypothesis, however 
plausible it may be. 

A number of authors have remarked that the change from the two-dimensional 
character of the fracture surface of small earthquakes to its one-dimensional character 
for large earthquakes should lead to a modification of the exponent of the Gutenberg- 
Richter law from a value /i ~ 2/3 to a value larger than one (thus ensuring convergence 
of the mean seismic rate), while still keeping a power law shape [Kanamori and 
Anderson, 1975; Rundle, 1989; Romanowicz, 1992; Pacheco et ai, 1992; Romanowicz 
and Rundle, 1993; Okal and Romanowicz, 1994; Sornette and Sornette, 1994; Sornette 
et ai, 1996]. In these models, the cross-over between the two power laws is a measure 
of the thickness of the seismogenic zone in the case of strike-slip earthquakes and of 
the downdip dimension of rupture in the case of earthquakes in subduction zones. 

Another model assumes a sharp cut-off at a maximum moment M max such that 
absolutely no earthquakes are to be expected with M > M max [Anderson and Luco, 
1983], a version of which is used for instance in [WGCEP, 1995] for the assessment 
of seismic hazards in southern California. 

Kagan [1993, 1994, 1997] and Main [1996] have advocated a different "soft" cut- 
off according to which the power law is multiplied by an exponential roll-off at large 
moments : 

C M 

P xg (M) dM = exp[-— ] dM , for M t < M < +oo , (4) 

where C is a constant of normalization. We will refer to this expression as a Gamma 
law. Beyond M xg , P xg (M) decays much faster than the pure Gutenberg- Richter law 
P(M). However, this is a "soft" cut-off since moments larger than M xg are possible 
but with smaller and smaller probabilities compared to the "pure" Gutenberg- Richter 
power law. Quantitatively, the probability to exceed the corresponding magnitude 
m xg by a small fraction of a unit magnitude is very small. For instance, the probability 
to exceed M xg is J^ xg P xg (M) dM = 0.11 (M xg /M t )~ IJ ' (for /i = 2/3), i.e. is smaller 
by a factor of about ten than the probability predicted from the extrapolation of the 
pure power law distribution (|2]). The probability to exceed 7M xg , i.e. a magnitude 
m = m xg + 0.6 is j™ Mxg P xg {M) dM = 7 10~ 5 (7M xg /M t )~» (for y. = 2/3), i.e. 
is about ten thousand times less probable than the probability predicted from the 
extrapolation of the pure power law distribution (0). 



Since the tectonic seismic moment balance is dominated by the very largest earth- 
quakes and since their impact on loss of life and properties is huge, it is of great 
importance to constrain as much as possible the shape of their distribution. Here, 
we present a simple but powerful and general probabilistic theoretical approach that 
shows that the Gamma distribution is indeed the best model, under the following 
hypothesis : 

1. the unconditional Gutenberg- Richter power law distribution (|2|) is supposed to 
hold in the absence of any constraint. This assumption cannot be verified in 
an absolute sense by any finite set of observations which may present inherent 
constraints. 

2. One or several constraints are imposed, either based on conservation laws (to- 
tal observed tectonic deformation rate) or on the nature of the observations 
themselves (maximum observed magnitude in a finite time window). 

Our approach uses a maximum likelihood formulation which can be interpreted using 
information theory as a minimization of the information added from the specification 
of the constraints, given an earthquake catalog. In a sense, it has similarities with the 
approach of Main and Burton [1984], whose goal was to derive the Gutenberg-Richter 
distribution from the maximization of an information entropy. While the general Von 
Neumann/Shannon information entropy principle is solidly based on fundamental 
probability theory, the choice of the specific entropy made in Main and Burton [1984] 
obviously controls the (power law) nature of the solution. This is spelled out in detail 
by Shen and Mansinha [1983], who show that the probability distribution generated 
from the principle of maximum entropy will depend on the (arbitrary) choice of the 
independent variable and on the choice of the prior distribution representing our 
complete ignorance. This prior distribution is shown to be inversely proportional to 
the measurement errors, that in general are hard to estimate reliably. 

In contrast, we do not claim that the Gutenberg-Richter law derives from such 
a principle and we do not attempt to derive it in any other way (see Main, 1996; 
Grasso and Sornette, 1998 for reviews of proposed physical mechanisms behind the 
Gutenberg-Richter law). Our goal is to quantify the distorsion imposed on the 
Gutenberg-Richter law by the constraints. 



We first present the method in general terms and exemplify it on the simple case of 
a dice play. We then apply it to the Gutenberg-Richter law and analyze two different 
types of constraints, namely the existence of a finite moment release rate and the 
observation of the size of the maximum moment in a finite catalog. 



Formally, the question we address is how to find a probability distribution satisfying 
the two conditions 1 and 2 of the previous section. 

As pointed by V. Pisarenko (private communication), a natural way to address 
this question is to use some functional R(P] P x ) measuring the "distance" between 
the Gutenberg-Richter density P and the desired modified density P x . Then, one 
can minimize this distance R(P] P x ) under the given conditions. The problem is that 
there is a certain degree of arbitrariness in the choice of the distance R(P; P x ), that 
lead to different solutions. 



If one takes the Kullback Distance 1 representing average log-likelihood 
ln[P x (v)/P(v)\ of P x against P [Kullback, 1958] 



one gets directly (using the Lagrange multiplier method) the Gamma distribution 



where the constants a and b are determined by the constraints. 

However, there are other "distances" that are a priori as justifiable as the Kullback 
Distance 1 and which lead to different results. The Kullback Distance 2 is the average 
log-likelihood of P against P x 



2 Nature of the problem 





P x (v) = P(v) e' 



a—bv 



(6) 





which leads to the modified solution 



Px(v) 



P(v) 



(8) 



a + bv 



where the constants a and b are again determined by the constraints. 



Another example is the Kullback Distance 3, quantifying the "divergence" be- 
tween P and P T 



where G(z) is the inverse function of g(z) = z + Inz. 

Thus, as a consequence of the existence of some degree of arbitrariness in the 



unique solution of the two conditions 1 and 2 of the previous section. 

Our approach provides a way to avoid the arbitrariness in the choice of distance 
between P and P x , based on fixing the random sample mean of observed events. This 
constraint could be seen as too restrictive, since it selects among all realizations of 
possible sequences of earthquakes only those where the sample mean is exactly equal 
to a specified value. Our point is that this constraint is fixed to a specific value by 
an independent global measurement of the cumulative strain obtained by geodetic or 
satellite techniques, thus providing an estimation of the cumulative released moment 
(neglecting for the time being difficulties associated with the tensor nature of the 
problem). Thus, we propose to condition the modified Gutenberg- Richter distribution 
on those specific random realizations that are consistent with the global measurement. 
This rather specific constraint would not apply in all circumstances. 

We now turn to a presentation of this approach. 

3 Frequencies conditioned by constraints 

Let Vx, i> 2 , f n -i, v n be the different possible values of a random variable distributed 
according to a given a priori density distribution P{v). In the earthquake case, 
if magnitudes are given with a precision 0.1, they can only take values such as 
...,5.6,5.7,5.8,5.9,6.0,6.1.... In general, experimental or observational data are al- 
ways coarse-grained at a given resolution level, thus making the measurements dis- 
crete. In the sequel, we use discrete notations, as it is straightforward to write our 
results in a continuous framework by simply taking the limit of infinite resolution. 




(9) 



which leads to the following solution 




(10) 



choice of distance R(P; P x ), this approach does not select the Gamma law (|6]) as the 



A given catalog consists of N events. Each event, say the l-th one, has a value v\ 
taken from the pool of the n possible values. Then, the classical law of large numbers 
[Gnedenko and Kolmogorov, 1954] states that the frequency f(v{) with which one 
measures a given value V\ among n possible values of the same random variable 
converges towards its a priori probability P{yi) as N — > oo. This idea is in fact at 
the basis of the frequency concept of probability. 

Now, suppose that the measured mean of these N realizations deviates from the 
theoretical mean. What can we say about the frequency of each value vp. More 
generally, suppose that we possess additional observational constraints. What are 
the consequences for the relative frequencies of the events? 

This question has a well-defined mathematical answer in the limit of large N. In 
general, the frequency of a given value v\ converges to a well-defined number in the 
limit of large iV but this number f(vi) is different from the probability P(v{)\ The 
fundamental reason for this result is that there is a close relationship between the 
existence of the deviation of the mean from its theoretical value and the existence of 
frequencies that are different from their theoretical probabilities. Probability theory 
allows one to compute precisely this anomalous behavior in the limit of large N. Our 
results also hold for finite N. Technically, this analysis belongs to the so-called large 
deviation regime [Lanford, 1973; Frisch, 1995]. 

Let yi, i/N be the N observed values that can take one of the n possible values 
i>i, v n . We denote the observed mean V by 

^4f>> (ii) 

JV 3=1 

which can also be written 

n 

V = '£fiv l , (12) 
i=i 

where fi = j±is the observed frequency of the value vi, Ni is the total number of 
realizations and N is the total number of events. In expression (|i~2"D , the N realizations 
have been partitioned into groups of identical values, so that the first group contains 
Ni variables each equal to v\, the second group contains N 2 variables each equal to 
t>2, and so on. Therefore, N\ + N 2 + ... + N n = N. 

The law of large numbers states that fi = j±—> P(v{), when N — ► oo. For an 
observed value of V = x, what can we say about /;? More precisely, what are the 



values taken by ft, conditioned by the observation of V = x? In the earthquakes ap- 
plication, the constraint on the average may for instance reflect the long-term tectonic 
deformation balance. A by-product of our calculation will be an assessment of the 
impact on the earthquake frequency distribution of an error made in the estimation 
of the long-term tectonic deformation or of its possible non-stationarity. 

To solve this problem, we first estimate the probability to observe the frequencies 
/ij /2j •••) fn from a total of N realizations of the random variable : 

AH ™ 

- (Nh) , NU) ,., NUy . n w , d3) 



where Nfi = N t and (Nf t )\ is Nfi(Nfi - l)(Nfi - 2). ..4.3.2.1. To write (gj), we have 
used the assumption that the events are independent. Using the Stirling formula 
In AH « NlnN - iV to expand (Nfi)\, we find 

/'(/.../, !,,)-< NH{h > h w , (14) 

where 

n r 

H(f 1 J 2 ,...J n ) = J2fi lo S^ ( 15 ) 

is often called neguentropy (the negative of the entropy). For N large, the law 
of large numbers states that the frequencies f converge towards the values that 
minimize H(fi, f 2 , f n ) in the presence of constraints (i.e. such that P(/i, f 2 , f n ) 
is maximum). Note that, while f\, f 2 , f n are random values, they converge to well- 
defined non-random values f(vi), f(v 2 ), f(v n ) for — > +oo. 

In the absence of constraints, the normalization condition J2?=i fi — 1 reduces 



( fl^) to the law of large number 

fi - Piyi) ■ (16) 

However, if we observe V = Ya=i fi v i = x i the maximum likelihood frequencies 
are those that minimize the function 



r 1 



H(fl, h, ■■; fn) - Al 



fi - 1 ~ A 2 fa 



X 



1=1 



(17) 



W=l 

where Ax and A2 are two Lagrange multipliers. The technique of Lagrange multi- 
pliers allows one to solve an optimization problem in the presence of constraints by 
incorporating them in the function to be minimized. Then, the solution depends on 
these factors, which are then eliminated by imposing the constraints on the solution 



[Bertsekas, 1982]. In the present case, the two constraints associated to Ai and A2 
are the normalization of the /j's and the observation that V = x. The solution is 

fl - f(vi) = P(vi) , (18) 

where (3{x) = — A2, which is determined as a function of x from 
where the "partition function" Z((3) is defined by 

n 

Z((3) = Y,P(vi)e-^. (20) 
i=\ 

The expression (0) gives the frequencies of the values of the random variables vi, 
conditioned by the existence of a constraint on the mean. Notice that the case 
f3 — retrieves the unconstrained case. We thus expect that the behavior of Z(/3) 
close to (3 = controls most situations where the constraints introduce only minor 
perturbations in most of the distribution. 

It is not fortuitous that the expressions (p!8|-^0|) bear a strong similarity with the 
statistical mechanics formulation [Rau, 1997] of systems composed of many elements, 
where Z(f3) is the partition function, (3 is the inverse temperature and —\ogZ{(3) is 
proportional to the free energy. The fact that the constraint is seen as a "high tem- 
perature" (/3 — > 0) perturbation is clear : the constraint is analogous to an "energy" 
added to the "free energy" — log Z(j3), which in the absence of constraint is solely 
controlled by the "entropy" H . The relative importance of entropy and energy is 
weighted by the temperature, the entropy dominating at high temperatures. 

Let us illustrate this result flI5|) on a dice game. Suppose that a dice with six faces 
is thrown N times and that one counts the frequencies //, I = 1 to n = 6 with which 
each of the six faces of the dice occur. According to the law of large numbers, the six 
fi tend to 1/6 ~ 0.166 for large N and the mean /1 +2/2 + 3/3+4/4+5/5 + 6/6 tends to 
3.5. Let us now assume that we observe a mean x — 4. The formula ([Tj|) predicts that 
the frequencies that contribute to this deviation are not the same anymore. We get 
/1 w 0.103, / 2 w 0.123, / 3 w 0.146, / 4 « 0.174, / 5 w 0.207 and / 6 w 0.247. In other 
words, the five and six occur about twice as much as the one. Intuitively, the large 
values become more frequent because the outcomes are biased by the observation of 
a larger mean. 



We stress that this probabilistic approach does not explain why the constraint 
exists or why there is a deviation from the mean. It simply draws the best conclusions 
on the frequencies that are compatible with the existence or observation of such a 
constraint. 

4 Application to earthquakes 

4.1 Constraint of a finite total moment release : theory of 
the Gamma distribution 

Consider the normalized power law distribution P(v) 

P(v) dv = -4- dv , (21) 

where v = M/M t is the normalized seismic moment and \x ~ 2/3. From this normal- 
ization, the integral from 1 to oo of P(v) is equal to 1. 

It is convenient to return to continuous notation, for which expression (P0| ) for 
Z((3) reads 

m = l d,— . (22) 

In the appendix, we show that for large x, i.e. small (3 

Z(J3) = , (23) 

with 

d li = T(l- f i) , (24) 
where T denotes the Gamma function. Condition (|i~9|) leads to 

Wl) = (ffl_^))^. (25 ) 

The observed frequency of earthquake size is thus 

-/3 a (x)vi 

f l ^f{v l ) = C{ii 1 x)^^ ri (26) 

v i 

where 



C(ji,x)=fi exp(r(l-^)^ a (x)]") . (27) 



The expression ( fffi) recovers the Gamma distribution @ postulated by Kagan [1993, 
1994, 1997]. Using v = M/M t , we rewrite (H) as (D where the soft "cut-off" M xg is 
given by 



Mt ( x \ i-M 



Let us translate this formula into geologically meaningful quantities. Here, we 
follow the notation of Kagan [1997] and will test our results against his. The quantity 
x can be expressed as a function of the geological rate M of deformation (in Nm/year) 
of the region under consideration and of the yearly number at = N/ At of events with 
moments above the threshold M t (taken from the Harvard catalog over its lifespan 
At = 18.5 years) as follows 

M At 

x = —— . 29 

M t N y ' 

Using p = 2/3, which we shall assume fixed, we get from (p8|) 

M xg = M t (^) 3 , (30) 



and 



x 

m xg = m t + 2 log 10 , (31) 



where m xg is the magnitude corresponding to M xg and m t is the magnitude corre- 
sponding to the moment threshold M t . This equation is essentially the same as the one 
used by Kagan [1997]. In particular, (|31~1) with (|29|) shows that m xg = 2 log 10 M+ con- 
stant, in agreement with Kagan [1997, his equation (9)] for /i = 2/3. The only dif- 
ference between our equation and Kagan's equation (9) is in the additive factors and 
in our fixed choice for \x = 2/3. Fixing fi is justified by the fact that some catalogs 
are very short and it is reasonable to limit as much as possible the number of free 
variables. From Q3"TD, we see that an error on x, i.e. on M, of a factor two results 
into an error of 0.6 in the "soft cut-off" magnitude m xg . 

Following Kagan [1997], we take m t = 5.8 corresponding to M t = 10 17 ' 7 Nm = 
0.5 10 18 Nm. The results are summarized in the Tables 1-3 for subduction zones, 
collision zones and mid-ocean ridges. We compare our estimation for the "soft cut- 
off" magnitude m xg for the Gamma law derived from our analysis (|3T| ) with the 
corresponding values obtained by Kagan [1997] for the Flinn-Engdahl seismic regions. 
We find very good agreement with the estimations of Kagan [1997] for the subduction 
zones and collision zones. Our estimation of m xg for mid-ocean ridges remains in the 



range 8 — 9 also in agreement with Kagan [1997] who quotes the value m xg = 8.7, 
assuming /i = 0.63. Our results for mid-ocean ridges are however less sensitive to 
large fluctuations to unphysical values [Kagan, 1997]. 

The expressions ( f28| , |3~TD with ( p9"D show that M xg and m xg increase with x, i.e. 
with the constraining geological rate of deformation M. If M goes to infinity (the 
constraint no longer exists), we recover the pure Gutenberg- Richter distribution with 
M xg — > oo. The relation M xg ~ M~ obtained from (fffi) follows from this simple 
argument. M ~ N(M) is the sum of N earthquake contributions, where (M) is 
the average seismic moment released per earthquake. For a power law distribution 
with an effective truncation at M xg , (M) ~ JjJJ s dM M/M 1+ ^ ~ M\^ 1 . Inverting 
M ~ M*~ M , we get the dependence of M xg as a function of M in (|28|). 

Is the statistical estimate of the curvature of the Gutenberg- Richter law based on 
ten or twenty earthquakes reliable? In our experience, we have found that formulas 
based on asymptotic large limits often work satisfactory even for remarkably small 
systems that would a priori rule out their validity. Let us mention, for instance, 
continuous hydrodynamics equations that work in superfluid helium flows for fluid 
layer thicknesses equal to a fraction of the size of one helium atom (see for instance 
Noiray et al. [1984]). Another example is the liquid-solid transition of clusters of 
atoms (see for instance Matsuoka et al. [1992]) that has essentially all its infinite 
limit thermodynamic properties as soon as the number of atoms is larger than a few 
tens. There are many other examples where extrapolation of asymptotic formulas 
valid for large statistics provide (good) surprises in the small statistical limit. 

4.2 Constraint of a finite total moment release and of a max- 
imum size 

Let us now assume that the average seismic moment release is again V — x (in 
normalized units) and in addition no observations have shown v > v max . In other 
words, M max = M t v max is the largest seismic moment observed in the catalog. 
How are the observable frequencies f(v) deviating from the "pure" pdf P(v) (still 
assumed to be the pure power law (|2lD )? One could argue that this constraint is 
not natural since it might simply result from the artifact of a limited catalog. But 
this is precisely the question we ask: conditioned by the absence of v's larger than 
Vmax, what is the implication for the distorsion of the derived distribution? Here, 



we are simply pointing out that as long as a "great" earthquake does not occur, this 
may lead to the false estimation that the distribution is truncated, while in fact the 
truncation, if any, occurs at larger unsampled values. 

The answer is obtained by following the steps in the previous section. This leads 
to the modified expression of the partition function 

-Pv 

""" dv ^ 

/l 

This can be written as 



m = I. dv^r- (32) 



oo /, p-Pv roo n p~/3v 



m = i ^ ^— - / dv p ( 



! v 1+ V Jv Tji+M 

OO ii p—0 v rOO it p—0 Vmax V 

, x ^^r-K-l" M / dv ■ ( 33 ) 

Proceeding as in the previous section and using the result in the appendix, we get 
from (|D 

Z{p) = e- d ^ - [v max ]-» e-^I^m..]" . (34) 

The inverse "temperature" (3 is again given by (|19|). 
Two cases must be considered. 

1. /3 a (x) v rri ax > 1 : this condition is the same as M max > M xg , i.e. the largest 
observed seismic moment is larger that the "soft cut-off" M xg of the Gamma 
distribution. In this case, the second term in (El) can be neglected and we 



recover the previous results (pq) . The impact of M max is negligible. 



2. (3 a (x) v max < 1 : the observation of the maximum observed magnitude will 
modify the observed Gutenberg-Richter law as we now calculate. 



In the case (3 a (x) v max < 1, the simplest approach is to expand ([34]) in powers 
of up to second order. Then, as the first-order cancels out between the first and 
second term in the r.h.s. : 

z(p) « (i - \v max m - i v» max ^ , (35) 

which yields 



dp 



V dl v^ ax (3 2 ^ . (36) 



Equating to x according to the equation (|T^) yields 



U[r(i-//)]2^ J (4.784 tc) ' (37) 

using /i = 2/3. Inverting, we get the modified "soft cut-off" seismic moment entering 
into the Gamma distribution 

2 

M™ ax = M t ( L7S J''. (38) 

This has a dependence in x which is the inverse of the previous case (|30|) where the 
impact of the observation of the largest earthquake is not felt. 

As a case study, assume that the "soft cut-off" magnitude m xg is about 8.5. Using 
(|3l|), this corresponds to a value x = 40. This magnitude is the most probable value 
found in the results shown in the Tables 1-3 as well as by Kagan [1997]. Let us 
assume that we have only a limited catalog and that the largest earthquake in this 
catalog has a magnitude m max = 7.5, which corresponds to v max = M max /M t = 355. 
This example corresponds to Southern California with the largest earthquake in the 
southern earthquake catalog being the Kern county 1952 earthquake. Introducing 



these values x = 40 and v max = 355 in pi) yields M™ ax = 1.08 10 20 Nm, i.e. 



m ™g X — 7.36. This theory thus predicts a slight bending down of the Gutenberg- 
Richter law at a value slightly smaller than the maximum observed magnitude. This 
occurs when this later value is significantly less than the "soft cut-off" magnitude 
m xg solely deduced by the geological tectonic deformation rate. In this example, the 
number of earthquakes of magnitude close to 7.4 is about one third the number that 
would be extrapolated from the pure Gutenberg-Richter power law. If the maximum 
observed magnitude is m max = 7.0 corresponding to v max = 316, we get M™ ax = 
0.86 10 20 Nm, i.e. m™ g ax = 7.28. 

4.3 Other constraints 

Another example is motivated by the recent debate as to whether there is a deficit in 
intermediate size earthquake in southern California since 1850 [WGCEP, 1995; Stein 
and Hanks, 1998]. One can estimate the modified Gutenberg-Richter law induced 
by both this deficit and the constraint of a finite moment release by using the same 
technique as described above. Let us briefly indicate how to proceed. In the simplest 



version, we consider a total deficit between v\ and v 2, i.e. no events of size v 1 < v < V2 
occurred. We still keep the constraint of a given average moment release. The solution 
is obtained by following the same steps as in the previous section, which lead to the 
modified expression of the partition function 



vi n p~P v roo a p-Pv 



m=l dv^—+ dv^—, (39) 



] V 1+ V Jv 2 V 1+ V 



which can also be written as 



Z W) = I dv ^— - - / dv + dv' 



1 V 1+ V J m V 1+ ^ Jv2 V 1+ ^ 

00 /, P —P v roo a p~P vi v roo . „—P V2 v 

dv ^— [«!]-" / dv^— + dv^— . (40) 

1 v 1+ ^ 1 J Ji v 1+ v 1 J Ji v 1+ ^ v ' 



Using the result of the appendix, we get from ( f23f ) 

Z(f3) = e~W - e - d » [vi ^ + [V2]-" e- d » [v2 ®" . (41) 



The inverse "temperature" (3 is again given by ([19]). Different cases can then be 
analyzed as a function of the relative influence of the values v%, v<i and the global 
geological deformation rate x. 

5 A physical derivation of the Gamma distribution 

In this section, we complement the analysis by proposing a simple physically-based 
derivation of the Gamma distribution. Our previous considerations have been of a 
probabilistic nature. It is useful to extend our intuition by identifying the structure of 
physical models that are compatible with a Gamma distribution. The simple model 
discussed now shows that the self-similarity and homogeneity conditions, together 
with a self-consistent cascade mechanism, lead to the Gamma distribution. 

In the spirit of reaction rate theory and in analogy with various approaches to 
model fragmentation processes [Cheng and Redner, 1988; Brown and Wohletz, 1995], 
we posit that the number of earthquakes of "energy" v is the solution of the following 
self-consistent integral equation : 

r+00 

P(v) =C dv' P{v') f(v' -> v) , (42) 



where C is a constant and f(v' —>■ v) is the distribution of event sizes v arising from a 
cascade of earthquakes triggered by the event of size v'. This model envisions that the 



crust self-organizes into a state where events are correlated, each of them being able to 
trigger a set of smaller earthquakes (aftershocks). The lower bound v in the integral 
express that earthquakes are preferentially triggered by larger preceeding events. We 
assume that f(v' — > v) is a power law of v'/v with exponent 1 + fi (f(v' — ► v) — 
(V/f) 1+M ). Then, (1) the power law assumption will lead to the Gutenberg- Richter 
distribution; (2) the homogeneous dependence in v'/v is the simplest law compatible 
with self-similarity. The solution of ( f42"|) is found to be the Gamma distribution : 

P(v) =PoJ^ , (43) 

where Pq is a normalizing constant. 

The general class of branching models [Vere- Jones, 1977] provides a simple con- 
crete geometrical implementation of this cascade model (f4"2"p. Branching models de- 
scribe the notion of a cascade that may either end after a finite number of steps 
or diverge, depending upon the value of a control parameter, the branching prob- 
ability. It has been applied to describe failure and earthquakes, seen as resulting 
from a succession of events chained through a causal connection [Vere- Jones, 1977]. 
The resulting distribution of event size is the Gamma distribution in the sub-critical 
regime. 

6 Concluding remarks 

We have shown how to formulate the effect of a global constraint on the observed 
Gutenberg-Richter law, using simple probability concepts. The remarkable result is 
that a constraint leads in general to a modification of the Gutenberg-Richter power 
law into a Gamma law, as advocated by Kagan [1993, 1994, 1997], with a "soft cut- 
off" magnitude controlled by the constraint. This provides a strong basis for the use 
of the Gamma distribution as a model of earthquake frequency-moment distribution. 

Technically, the reason for this result may be seen to lie in the fact that the 
logarithmic density of frequencies normalized by N given by expression fll5|) is exactly 
like the Kullback Distance 1 given by expression (|). In this sense, our approach 
proposes in this context an original way to solve for the a priori arbitrariness in the 
choice of the distance between distributions. One could thus state that the Gamma- 
distribution gets still another justification. 



This general approach can be used to study the impact on the Gutenberg- Richter 
law stemming from the existence of other observational constraints. We have thus also 
shown how to incorporate the observational constraint of the existence of a maximum 
observed magnitude and have outlined how the observation of a deficit of earthquakes 
in a certain magnitude window could also be tackled by this technique. 

We acknowledge useful discussions with D.D. Jackson, Y.Y. Kagan and G. Ouillon 
and especially illuminating correspondence with V.F. Pisarenko. We also thank I. 
Main as a referee for thoughtful remarks and J. Pujol as the associate editor for a 
careful reading of the manuscript. 



APPENDIX 

We wish to calculate Z((3) given by (|22|) . Let us be general and consider the case where 
H can be larger than 1. This situation has been argued for large earthquakes [Pacheco 
et al, 1992; Sornette et ai, 1996]. Denote I the integer part of \i (I < \i < I + 1). 
Integrating by part / times, we get 



(/j-i)0i-2)...0i-i) 

+ 7 V7 7 a / dxe~ x x l -» . (44) 



(/i-l)(/i-2)...(/i-/)^ 
This last integral is equal to 



oo 

p» I dxe~ x x 1 -^ = T(l + 1 - + /? m 7 * (/ + 1 - fi, p)] , (45) 
where T is the Gamma function (T(n + 1) = n! for an integer argument) and 

+oo an 

^ + 1 -^ e "£ r(/ + 2% + n ) < 46 > 

is the incomplete Gamma function [Abramowitz and Stegun, 1972]. We see that Z(/3) 
presents a regular Taylor expansion in powers of (3 up to the order I, followed by a 
term of the form (3^. We can thus write 

Z(f3) = 1 + n(3 + + n(3 l + + 0((3 l+1 ) , (47) 

with r\ = — (v), T2 = ^r-, For small (3, we rewrite Z((3) under the form 



Z(/3) = exp 



j2d k (3 k -d^ 

k=l 



(4S 



where the coefficient d^ can be simply expressed in terms of the r fc 's. In particular, 
we have 

d» = r(Z + 1 - n) . (49) 

The expression ( f4"8D generalizes the canonical form of the characteristic function of 
the stable Levy laws [Samorodnitsky and Taqqu, 1994] for arbitrary values of /i, and 
not solely for // < 2 for which they are defined. Levy laws can exist only for /j < 2 
and are stable upon convolution, i.e. their shape is unchanged up to a rescaling. 
The generalized expression (f4"8|) shows that the £azZ of a power law distribution also 



remains stable even for /x > 2. However, the tail slowly shrinks in size as the domain 
of validity of the power law tail for /i > 2 extends beyond a limit which slowly 
increases as ViVln iV with the number N of events. 

Note that the canonical form of the characteristic function of Levy laws is recov- 
ered for (j, < 2 for which the coefficient d,2 is not defined (the variance does not exist) 
and the only analytical term is (v)/3 (for /i > 1). 

For the earthquake application, /i < 1 and we thus obtain 

Z(f3) = e~ d ^ for small j3, i.e. large x . (50) 
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Table 1 

SUBDUCTION ZONES : Soft "cut-off" magnitude m xg for the Gamma law derived 



from equation (|31| ) and comparison to Kagan [1997] fits (indicated by [m xg ± a m ] a 
for Flinn-Engdahl seismic regions. N is the number of earthquakes in each region 
with seismic moment larger than the threshold M t . 



Seismic Region 


N 


M 


X 


[m xg ± cr m } a 


/ "I — 1 /I. *V I l\ \ 

m xg (Eq.©) 


A -I -I A 1 i ■ A 

Alaska-Aleutian Arc 


152 


1.80 10 2U Nm 


44 


8.53 ± 0.29 


8.58 


Mexico-Guatemala 


83 


0.84 10 20 Nm 


37 


8.41 ±0.30 


8.44 


Central America 


86 


0.88 10 20 Nm 


38 


8.42 ±0.30 


8.45 


Caribbean Loop 


31 


0.37 10 2U Nm 


44 


8.54 ±0.34 


8.58 


Andean S. America 


125 


3.00 10 20 Nm 


89 


9.09 ±0.29 


9.19 


Kermadec- Tonga- S amoa 


248 


2.10 10 20 Nm 


31 


8.27 ±0.28 


8.29 


Fiji Is. 


44 


0.81 10 20 Nm 


68 


8.88 ±0.31 


8.96 


New Hebrides Is. 


222 


1.70 10 2U Nm 


28 


8.19 ±0.28 


8.20 


Bismarck-Solomon Is. 


219 


1.74 10 20 Nm 


29 


8.22 ±0.28 


8.23 


New Guinea 


129 


3.00 10 20 Nm 


85 


9.07 ±0.29 


9.16 


Guam- Japan 


43 


1.02 10 20 Nm 


88 


9.08 ±0.32 


9.18 


Japan-Kamchatka 


227 


3.00 10 20 Nm 


49 


8.62 ±0.28 


8.67 


S.E. Japan-Ryukyu Is. 


22 


0.64 10 20 Nm 


117 


9.24 ±0.36 


9.35 


Taiwan 


52 


0.54 10 20 Nm 


38 


8.43 ±0.31 


8.46 


Philippines 


147 


1.25 10 2U Nm 


31 


8.27 ±0.29 


8.29 


Borneo- Celebes 


149 


1.47 10 20 Nm 


36 


8.39 ±0.29 


8.42 


Sunda Arc 


122 


2.30 10 20 Nm 


70 


8.90 ±0.29 


8.98 


Adaman Is. -Sumatra 


26 


0.94 10 20 Nm 


133 


9.41 ±0.35 


9.55 



Table 2 

COLLISION ZONES : Soft "cut-off" magnitude m xg for the Gamma law derived 



from equation (|31|) and comparison to Kagan [1997] fits (indicated by [m xg ± a m ] a 
for Flinn-Engdahl seismic regions. N is the number of earthquakes in each region 
with seismic moment larger than the threshold M t . 



Seismic Region 


N 


M 


X 


[m xg ± a rn ] a 


m xg (Eq.©) 


Burma-S.E. Asia 


20 


0.52 10 2U Nm 


96 


9.15 ±0.37 


9.26 


India-Tibet- Yunan 


29 


0.45 10 20 Nm 


57 


8.75 ±0.34 


8.81 


S. Sinkiang-Kansu 


20 


0.21 10 20 Nm 


39 


8.44 ±0.37 


8.47 


W. Asia 


50 


0.38 10 20 Nm 


28 


8.18 ±0.32 


8.19 


M.-E. -Crimea-Balkans 


37 


0.29 10 20 Nm 


29 


8.21 ±0.33 


8.22 


W. Mediterranean 


22 


0.15 10 20 Nm 


25 


8.10 ±0.36 


8.10 


Baluchistan 


10 


0.37 10 20 Nm 


137 


9.43 ± 0.45 


9.57 



Table 3 

MID-OCEANS RIDGES : Soft "cut-off" magnitude m xg for the Gamma law derived 
from equation ( |3"T| ) and comparison to Kagan [1997] fits (indicated by [m xg ± a m ] a 
for Flinn-Engdahl seismic regions. N is the number of earthquakes in each region 
with seismic moment larger than the threshold M t . 



Seismic Region 


N 


M 


X 


\P^xg ± 0"m] 


m xg (Eq.©) 


Baja-Calif-Gulf Calif. 


10 


0.17 10 2U Nm 


63 


12.81 ±2.45 


8.89 


Atlantic Ocean 


112 


0.67 10 20 Nm 


22 


8.28 ± 1.61 


7.98 


Indian Ocean 


94 


1.44 10 20 Nm 


57 


12.36 ± 1.63 


8.80 


Artie 


8 


0.15 10 20 Nm 


69 


13.24 ±2.64 


8.98 


S.E. & Antartic Pacific 


107 


2.38 10 20 Nm 


82 


13.98 ± 1.61 


9.12 


Galapagos 


16 


1.44 10 2U Nm 


332 


20.05 ±2.15 


10.34 


Macquarie Loop 


48 


0.43 10 2U Nm 


33 


10.03 ± 1.74 


8.33 



